Abstract: We show that Reed-Muller codes achieve capacity under maximum a posteriori bit decoding for transmission over the binary erasure channel for all rates 0 < R < 1. The proof is generic and applies to other codes with a sufficient amount of symmetry as well. The main idea is to combine the following observations: (i) monotone functions experience a sharp threshold behavior, (ii) the extrinsic information transfer (EXIT) functions are monotone, (iii) Reed-Muller codes are 2-transitive and thus the EXIT functions associated with their codeword bits are all equal, and (iv) therefore the Area Theorem for the average EXIT function implies that the threshold of RM codes is at channel capacity.
I. Introduction

Reed-Muller (RM) codes [1]–[4] are among the oldest codes in existence and, due to their many desirable properties, they are also among the most widely studied. In recent years there has been renewed interest in RM codes, partly due to the invention of capacity-achieving polar codes [?], which are closely related to RM codes. For a performance comparison between polar and RM codes, see [?], [?]. Simulations and analytical results suggest that RM codes do not perform well under successive and iterative decoding, but that they outperform polar codes under maximum a posteriori (MAP) decoding [?], [?]. Nevertheless, it is not known whether RM codes themselves are capacity-achieving, except for rates approaching 0 and 1 over the binary erasure channel (BEC) and the binary symmetric channel (BSC) [?].

In this paper, we show that RM codes indeed achieve the capacity for transmission over the BEC for any rate $R \in (0,1)$. The same result was shown independently by Kumar and Pfister [?] using essentially the same approach.

II. Main Result 

Let RM$(n, r)$ denote the Reed-Muller (RM) code of block length $N = 2^n$ and order $r$, see [?]. This is a linear code of rate $R = \frac{1}{N}\sum_{i=0}^{r}\binom{n}{i}$ and minimum distance $d = 2^{n-r}$, generated by all rows of weight at least $2^{n-r}$ of the Hadamard matrix $\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}^{\otimes n}$, where $\otimes$ denotes the Kronecker product. Let $[N] = \{1, \ldots, N\}$ denote the index set of codeword bits. For $i \in [N]$, let $x_i$ denote the $i$th component of a vector $x$, and let $x_{\sim i}$ denote the vector containing all components except $x_i$. For $x, y \in \{0,1\}^N$, we write $x \prec y$ if $y$ dominates $x$ component-wise, i.e., if $x_i \le y_i$ for all $i \in [N]$.

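As a concrete check of this description, the following Python sketch builds the generator rows of RM$(n, r)$ directly from the Kronecker power and verifies by brute-force enumeration that $K = \sum_{i \le r}\binom{n}{i}$ and $d = 2^{n-r}$ for RM(4, 2). The helper names (`rm_generator`, `codewords`) are illustrative choices of ours, and rows and columns are indexed by $0, \ldots, N-1$ instead of $1, \ldots, N$ for convenience. Exhaustive enumeration is only feasible for very small codes.

```python
from itertools import product
from math import comb


def rm_generator(n, r):
    """Rows of [[1, 0], [1, 1]]^{(x)n} whose Hamming weight is at least 2^(n - r)."""
    N = 1 << n
    rows = []
    for a in range(N):
        # Entry (a, b) of the n-fold Kronecker power is a product of n kernel
        # entries; it is 1 exactly when every bit set in b is also set in a.
        row = [1 if (b & ~a) == 0 else 0 for b in range(N)]
        if sum(row) >= 1 << (n - r):
            rows.append(row)
    return rows


def codewords(rows):
    """Enumerate all 2^K codewords (brute force, small K only)."""
    N = len(rows[0])
    for coeffs in product((0, 1), repeat=len(rows)):
        yield [sum(c * g[j] for c, g in zip(coeffs, rows)) % 2 for j in range(N)]


n, r = 4, 2
G = rm_generator(n, r)
N, K = 1 << n, len(G)
assert K == sum(comb(n, i) for i in range(r + 1))  # rate R = K / N
d = min(sum(c) for c in codewords(G) if any(c))
print(N, K, d)  # 16 11 4, i.e. d = 2^(n - r)
```

The rows of the Kronecker power are linearly independent (the matrix is lower triangular with ones on the diagonal), so the selected rows form a basis and the enumeration visits each codeword exactly once.
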
Let BEC($\epsilon$) denote the binary erasure channel with erasure probability $\epsilon$. Recall that this channel has capacity $1 - \epsilon$ bits/channel use. In what follows, we will fix a rate $R$ for a sequence of RM codes and show that the bit error probability of the code sequence vanishes for all BECs with capacity strictly larger than $R$, i.e., erasure probability strictly smaller than $1 - R$.

Theorem 1 (RM Codes Achieve Capacity on the BEC): Consider a sequence of RM$(n, r_n)$ codes of increasing $n$ and rate $R_n$ converging to $R$, $0 < R < 1$. For any $0 < \epsilon < 1 - R$ and any $\delta > 0$, there exists an $n_0$ such that for all $n > n_0$ the bit error probability of RM$(n, r_n)$ for transmission over BEC($\epsilon$) is bounded above by $\delta$ under bit-MAP decoding.

The only property of RM codes that has a bearing on the following proof of Theorem 1 is that these codes exhibit a high degree of symmetry and, in particular, that they are invariant under a 2-transitive group of permutations on the coordinates of the code [?], [?], [?]. In fact, this proof also shows that every sequence of 2-transitive codes is capacity-achieving. We will return to this point in Section III.

Lemma 1 (RM Codes Are 2-Transitive): For any $a, b, c, d \in [N]$ such that $a \neq b$ and $c \neq d$, there exists a permutation $\pi : [N] \to [N]$ such that

(i) $\pi(a) = c$, $\pi(b) = d$, and

(ii) RM$(n, r)$ is closed under the permutation of its codeword bits according to $\pi$. That is,

$$(x_1, \ldots, x_N) \in \mathrm{RM}(n, r) \implies (x_{\pi(1)}, \ldots, x_{\pi(N)}) \in \mathrm{RM}(n, r). \qquad (1)$$

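Lemma 1 is classically obtained from the affine group: reading the column index $b \in \{0, \ldots, N-1\}$ of the Kronecker construction as a vector in $\mathbb{F}_2^n$, every map $x \mapsto Ax + c$ with $A$ invertible permutes the coordinates of RM$(n, r)$ and preserves the code, and this group acts 2-transitively on $\mathbb{F}_2^n$. The Python sketch below does not prove the lemma. It samples random pairs $(a, b)$, $(c, d)$, builds one such affine permutation with $\pi(a) = c$ and $\pi(b) = d$, and checks over GF(2) that every permuted generator row stays in the code. All function names are hypothetical helpers, coordinates are 0-indexed, and bit $b$ of an integer holds coordinate $b$.

```python
import random


def rm_rows(n, r):
    """Generator rows of RM(n, r) as ints; bit b of a row holds coordinate b."""
    N = 1 << n
    rows = [sum(1 << b for b in range(N) if (b & ~a) == 0) for a in range(N)]
    return [g for g in rows if bin(g).count("1") >= 1 << (n - r)]


def rank_gf2(vectors):
    """Rank over GF(2) of ints viewed as bit vectors."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears the leading bit of b from v if it is set
        if v:
            basis.append(v)
    return len(basis)


def linear_map(cols, x):
    """GF(2) matrix-vector product; cols[k] is the image of the k-th unit vector."""
    y = 0
    for k, col in enumerate(cols):
        if (x >> k) & 1:
            y ^= col
    return y


def affine_perm(n, a, b, c, d, rng):
    """Random pi(x) = A(x ^ a) ^ c with A invertible and A(a ^ b) = c ^ d,
    hence pi(a) = c and pi(b) = d (requires a != b and c != d)."""
    while True:
        cols = [rng.randrange(1 << n) for _ in range(n)]
        if rank_gf2(cols) == n and linear_map(cols, a ^ b) == c ^ d:
            return [linear_map(cols, x ^ a) ^ c for x in range(1 << n)]


def permute(word, pi):
    """Word whose coordinate j is x_{pi(j)}, as in (1)."""
    return sum(((word >> pi[j]) & 1) << j for j in range(len(pi)))


n, r = 4, 2
rows = rm_rows(n, r)
N, rng = 1 << n, random.Random(0)
for _ in range(20):
    a, b = rng.sample(range(N), 2)
    c, d = rng.sample(range(N), 2)
    pi = affine_perm(n, a, b, c, d, rng)
    assert pi[a] == c and pi[b] == d and sorted(pi) == list(range(N))
    # the code is closed under pi iff each permuted generator row stays in the span
    assert all(rank_gf2(rows + [permute(g, pi)]) == len(rows) for g in rows)
print("20 random affine permutations preserve RM(4, 2)")
```

The rejection loop in `affine_perm` is the simplest correct way to pick $A$. An explicit basis extension would avoid the retries, but it is not needed at this size.
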
The 2-transitivity of the code implies many symmetries that will be critical in the proof, which we outline here. We will be interested in MAP decoding of the $i$th codebit $x_i$ from the observations $y_{\sim i}$, that is, all channel outputs except $y_i$. The error probability of the $i$th such decoder for transmission over a BEC($\epsilon$) is called the $i$th EXIT function [13, Lemma 3.74], which we denote by $h_i(\epsilon)$. We will see that all $N$ EXIT functions of an RM code (and of any 2-transitive code) are identical, and also that the erasure patterns that lead to decoding errors under this decoder exhibit a high degree of symmetry. These symmetries will imply that the EXIT functions have a sharp threshold behavior, i.e., the bit error probability is very small below a threshold and very large above it. A final and crucial benefit of considering this suboptimal decoder and EXIT functions instead of the optimal block-MAP decoder is the well-known Area Theorem [?], which will allow us to show that the threshold is at channel capacity and to conclude the proof.

Recall the basic definition of an EXIT function [13, Lemma 3.74] and its relation to bit-MAP decoding.

Definition 1 (EXIT Function): Let $C[N, K]$ be a binary linear code of rate $R = K/N$ and let $X$ be chosen with uniform probability from $C[N, K]$. Let $Y$ denote the result of transmitting $X$ over a BEC($\epsilon$). The EXIT function $h_i(\epsilon)$ associated with the $i$th bit of $C$ is defined as

$$h_i(\epsilon) = H(X_i \mid Y_{\sim i}). \qquad (2)$$

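Lemma 2 (EXIT Function and Bit-MAP Decoding): Let $C[N, K]$ be a binary linear code and let $\hat{x}_i^{\mathrm{MAP}}(y_{\sim i})$ denote the MAP estimator of the $i$th code bit given the observation $y_{\sim i}$, where the output $?$ indicates that the estimator cannot decide between 0 and 1. Then,

$$h_i(\epsilon) = \mathbb{P}\big(\hat{x}_i^{\mathrm{MAP}}(Y_{\sim i}) = ?\big). \qquad (3)$$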

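To illustrate Lemma 2, here is a brute-force Python sketch of the extrinsic bit-MAP decoder on the BEC: it keeps every codeword that agrees with the unerased entries of $y_{\sim i}$ and outputs the common value of bit $i$, or `?` if both values survive. Summing the weights $\epsilon^w(1-\epsilon)^{N-1-w}$ of the erasure patterns that produce `?` then gives $h_i(\epsilon)$ exactly. The sketch assumes that the all-zero codeword is sent, which is justified by linearity as in the proof of Lemma 4. It hard-codes a generator of RM(3, 1), the [8, 4, 4] extended Hamming code, with 0-indexed coordinates. The function names are illustrative.

```python
from itertools import product

ERASURE = "?"


def codewords(rows):
    """All codewords spanned by the generator rows (brute force)."""
    N = len(rows[0])
    return [tuple(sum(u * g[j] for u, g in zip(coeffs, rows)) % 2 for j in range(N))
            for coeffs in product((0, 1), repeat=len(rows))]


def extrinsic_bit_map(code, y, i):
    """MAP estimate of x_i from y_{~i}: the common value of bit i over all
    compatible codewords, or ERASURE if both values are still possible."""
    values = {c[i] for c in code
              if all(y[j] == ERASURE or y[j] == c[j] for j in range(len(c)) if j != i)}
    return values.pop() if len(values) == 1 else ERASURE


def exit_profile(code, i):
    """A[w] = number of weight-w erasure patterns on the positions j != i for
    which the decoder outputs ERASURE (all-zero codeword transmitted)."""
    N = len(code[0])
    A = [0] * N
    for pattern in product((0, 1), repeat=N - 1):
        erased = pattern[:i] + (1,) + pattern[i:]  # y_i itself is never used
        y = [ERASURE if e else 0 for e in erased]
        if extrinsic_bit_map(code, y, i) == ERASURE:
            A[sum(pattern)] += 1
    return A


def exit_function(A, eps):
    """h_i(eps) = sum_w A[w] eps^w (1 - eps)^(N - 1 - w)."""
    m = len(A) - 1
    return sum(a * eps**w * (1 - eps) ** (m - w) for w, a in enumerate(A))


# Rows of [[1,0],[1,1]]^{(x)3} with weight >= 4, i.e., a generator of RM(3, 1)
G = [[1, 1, 1, 1, 1, 1, 1, 1],
     [1, 1, 1, 1, 0, 0, 0, 0],
     [1, 1, 0, 0, 1, 1, 0, 0],
     [1, 0, 1, 0, 1, 0, 1, 0]]
code = codewords(G)
A = exit_profile(code, 0)
print(A)  # [0, 0, 0, 7, 28, 21, 7, 1]
print(exit_function(A, 0.3))
```
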
The most relevant property of EXIT functions for our purpose is the Area Theorem, see [?].

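Lemma 3 (Area Theorem): Let $C[N, K]$ be a binary linear code, and let $h(\epsilon) = \frac{1}{N}\sum_{i=1}^{N} h_i(\epsilon)$ be the average EXIT function. Then,

$$\int_0^{\epsilon} h(x)\, dx = \frac{1}{N} H(X \mid Y),$$

where $H(X \mid Y)$ is the conditional entropy of the codeword $X$ given the observation $Y$ at the receiver for transmission over BEC($\epsilon$). In particular, since BEC(1) erases every bit, $H(X \mid Y) = H(X) = K$ for $\epsilon = 1$, and

$$\int_0^1 h(x)\, dx = \frac{K}{N} = R.$$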



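The Area Theorem can be checked exactly on tiny codes. The Python sketch below (helper names are illustrative, coordinates 0-indexed) determines, for every position $i$, which erasure patterns of the other $N - 1$ bits leave $x_i$ undetermined. These are the patterns that contain the support, outside position $i$, of some codeword with $c_i = 1$, as in the proof of Lemma 4. The sketch integrates $h_i$ term by term using $\int_0^1 x^w (1-x)^{m-w}\,dx = 1/\big((m+1)\binom{m}{w}\big)$ with $m = N - 1$, and compares the average area with $R = K/N$ in exact rational arithmetic. It is a numerical sanity check for RM(3, $r$), not a proof.

```python
from fractions import Fraction
from math import comb


def rm_rows(n, r):
    """Generator rows of RM(n, r) as ints (bit b = coordinate b): the rows of
    [[1,0],[1,1]]^{(x)n} with weight >= 2^(n - r)."""
    N = 1 << n
    rows = [sum(1 << b for b in range(N) if (b & ~a) == 0) for a in range(N)]
    return [g for g in rows if bin(g).count("1") >= 1 << (n - r)]


def all_codewords(rows):
    """All GF(2) combinations of the generator rows."""
    words = {0}
    for g in rows:
        words |= {w ^ g for w in words}
    return sorted(words)


def average_exit_area(n, r):
    """Return (integral over [0, 1] of the average EXIT function, K / N), exactly."""
    rows = rm_rows(n, r)
    N = 1 << n
    m = N - 1
    words = all_codewords(rows)
    total = Fraction(0)
    for i in range(N):
        # supports, without position i, of the codewords with c_i = 1
        supports = [w & ~(1 << i) for w in words if (w >> i) & 1]
        for erased in range(1 << N):
            if (erased >> i) & 1:
                continue  # erasure patterns live on the positions j != i only
            if any((s & ~erased) == 0 for s in supports):  # x_i undetermined
                w = bin(erased).count("1")
                # integral of x^w (1 - x)^(m - w) over [0, 1]
                total += Fraction(1, (m + 1) * comb(m, w))
    return total / N, Fraction(len(rows), N)


for r in range(4):
    area, rate = average_exit_area(3, r)
    print(f"RM(3, {r}): area = {area}, R = {rate}")
# area = R = 1/8, 1/2, 7/8 and 1 for r = 0, 1, 2, 3
```
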
We now show that the erasure patterns that lead to decoding failures are monotone and symmetric. Recall that the decoding of each bit relies only on $N - 1$ received bits. We will denote each erasure pattern by a binary vector of length $N - 1$, where a 1 denotes an erasure and a 0 denotes a non-erasure. We first characterize the set $\Omega_i$ of erasure patterns that lead to a decoding failure for bit $i$.

Definition 2 ($\Omega_i$): Given a binary linear code $C[N, K]$, let $\Omega_i$ be the set that consists of all $\omega \in \{0,1\}^{N-1}$ for which there exists $c \in C$ such that $c_i = 1$ and $c_{\sim i} \prec \omega$.

Lemma 4 ($\Omega_i$ Encodes $h_i(\epsilon)$): Let $\omega \in \{0,1\}^{N-1}$ be the erasure pattern on the received bits $y_{\sim i}$. Then the $i$th bit-MAP decoder fails if and only if $\omega \in \Omega_i$. Consequently, if $\mu_\epsilon(\cdot)$ is the measure on $\{0,1\}^{N-1}$ that puts weight $\epsilon^w (1-\epsilon)^{N-1-w}$ on a point of Hamming weight $w$, then

$$h_i(\epsilon) = \mu_\epsilon(\Omega_i).$$

That is, $\Omega_i$ “encodes” the EXIT function of the $i$th position.

Proof: Since the code is linear and the channel is symmetric and memoryless, we can assume that the all-zero codeword was transmitted. Given an erasure pattern $\omega$, let $C'$ denote the set of all codewords $c$ that are compatible with the observation $y_{\sim i}$, i.e., all codewords for which $c_{\sim i} \prec \omega$. Note that since the code is linear, so is $C'$. This implies that if there exists a $c \in C'$ with $c_i = 1$, then half of all codewords in $C'$ have a 0 at position $i$ and the other half have a 1, and thus the bit-MAP decoder fails to decode bit $i$. On the other hand, if there is no $c \in C'$ with $c_i = 1$, then all compatible codewords have a 0 at position $i$, and thus the bit-MAP decoder succeeds. That is, $\Omega_i$ is the set of all erasure patterns such that the bit-MAP decoder cannot decide on position $i$ given the observation $y_{\sim i}$. The claim that $h_i(\epsilon) = \mu_\epsilon(\Omega_i)$ then follows from Lemma 2, since the memorylessness of the channel implies that an erasure pattern $\omega$ occurs with probability $\mu_\epsilon(\omega)$. ■

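Lemma 4 can also be verified exhaustively on a small code. The Python sketch below (hypothetical helper names, 0-indexed coordinates, bit $j$ of an integer holding coordinate $j$) compares two quantities for every position $i$, every erasure pattern $\omega$ and every transmitted codeword: the failure of the extrinsic bit-MAP decoder, and the membership test of Definition 2. Running it over all transmitted codewords, and not only the all-zero one, illustrates why the proof may assume that the all-zero codeword was sent.

```python
def rm_rows(n, r):
    """Generator rows of RM(n, r) as ints (bit b = coordinate b)."""
    N = 1 << n
    rows = [sum(1 << b for b in range(N) if (b & ~a) == 0) for a in range(N)]
    return [g for g in rows if bin(g).count("1") >= 1 << (n - r)]


def all_codewords(rows):
    """All GF(2) combinations of the generator rows."""
    words = {0}
    for g in rows:
        words |= {w ^ g for w in words}
    return sorted(words)


def in_omega(words, i, erased):
    """Definition 2: some codeword c has c_i = 1 and c_{~i} < omega, where omega
    is the set of erased positions, given as an int without bit i."""
    return any((c >> i) & 1 and (c & ~(1 << i) & ~erased) == 0 for c in words)


def decoder_fails(words, i, erased, sent):
    """Extrinsic bit-MAP decoding of bit i: collect bit i over all codewords
    that agree with `sent` on every unerased position j != i."""
    observed = ~erased & ~(1 << i)  # positions whose channel output is known
    values = {(c >> i) & 1 for c in words if ((c ^ sent) & observed) == 0}
    return len(values) == 2


n, r = 3, 1
words = all_codewords(rm_rows(n, r))
N = 1 << n
checked = 0
for i in range(N):
    for erased in range(1 << N):
        if (erased >> i) & 1:
            continue
        expected = in_omega(words, i, erased)
        for sent in words:
            assert decoder_fails(words, i, erased, sent) == expected
            checked += 1
print(checked, "cases agree")  # 16384 cases agree (8 positions * 128 patterns * 16 codewords)
```
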
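Lemma 5 ($\Omega_i$ Is Monotone): If $\omega \in \Omega_i$ and $\omega \prec \omega'$, then $\omega' \in \Omega_i$.

Proof: If $\omega \in \Omega_i$, then there exists a codeword $c$ such that $c_i = 1$ and $c_{\sim i} \prec \omega$. Since by assumption $\omega \prec \omega'$, it follows that $c_{\sim i} \prec \omega'$, which implies $\omega' \in \Omega_i$. ■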

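Lemma 5 is immediate from Definition 2, but it can also be observed operationally. The Python sketch below decides whether $x_i$ is determined by the unerased outputs with a rank test over GF(2): bit $i$ is recoverable exactly when column $i$ of a generator matrix lies in the span of the unerased columns. This test does not refer to $\Omega_i$ at all. The sketch then checks that erasing one more position never turns a decoding failure into a success. It uses small RM codes with 0-indexed coordinates, and the helper names are hypothetical.

```python
def rm_rows(n, r):
    """Generator rows of RM(n, r) as ints (bit b = coordinate b)."""
    N = 1 << n
    rows = [sum(1 << b for b in range(N) if (b & ~a) == 0) for a in range(N)]
    return [g for g in rows if bin(g).count("1") >= 1 << (n - r)]


def rank_gf2(vectors):
    """Rank over GF(2) of ints viewed as bit vectors."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears the leading bit of b from v if it is set
        if v:
            basis.append(v)
    return len(basis)


def columns(rows, N):
    """Column j of the generator matrix, packed into a K-bit int."""
    return [sum(((g >> j) & 1) << k for k, g in enumerate(rows)) for j in range(N)]


def recoverable(cols, i, erased):
    """x_i is determined by the unerased y_j (j != i) iff column i lies in
    the GF(2) span of the unerased columns."""
    known = [cols[j] for j in range(len(cols)) if j != i and not ((erased >> j) & 1)]
    return rank_gf2(known + [cols[i]]) == rank_gf2(known)


def check_monotone(n, r):
    N = 1 << n
    cols = columns(rm_rows(n, r), N)
    for i in range(N):
        failing = {E for E in range(1 << N)
                   if not ((E >> i) & 1) and not recoverable(cols, i, E)}
        for E in failing:
            for j in range(N):
                if j != i and not ((E >> j) & 1):
                    # erasing one more position keeps the pattern in the failing set
                    assert (E | (1 << j)) in failing
    return True


print(check_monotone(3, 1))  # True
```
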
Lemma 6 ($\Omega_i$ Is Symmetric): If $C[N, K]$ is a 2-transitive binary linear code, then, for any $i \in [N]$, $\Omega_i$ is invariant under a 1-transitive group of permutations of the $N - 1$ coordinates. Following [17], we say that $\Omega_i$ is symmetric.

Proof: Since $C$ is 2-transitive, for any $j_1, j_2 \in [N] \setminus \{i\}$, there exists a permutation $\pi : [N] \to [N]$ such that

• $\pi(i) = i$,

• $\pi(j_1) = j_2$,

• $(c_{\pi(1)}, \ldots, c_{\pi(N)}) \in C$ for any $(c_1, \ldots, c_N) \in C$.

Let $S_1 : [N-1] \to [N] \setminus \{i\}$ be defined as $S_1(k) = k$ for $k \in \{1, \ldots, i-1\}$ and $S_1(k) = k + 1$ for $k \in \{i, \ldots, N-1\}$. Let $S_2 : [N] \setminus \{i\} \to [N-1]$ be defined as $S_2(k) = k$ for $k \in \{1, \ldots, i-1\}$ and $S_2(k) = k - 1$ for $k \in \{i+1, \ldots, N\}$. Consider the permutation $\tilde{\pi} : [N-1] \to [N-1]$ defined as $\tilde{\pi}(k) = S_2(\pi(S_1(k)))$. Note that, by changing the choice of $j_1$ and $j_2$, we generate a 1-transitive group of permutations on $[N-1]$. It then suffices to show that if $\omega = (\omega_1, \ldots, \omega_{N-1}) \in \Omega_i$, then $(\omega_{\tilde{\pi}(1)}, \ldots, \omega_{\tilde{\pi}(N-1)}) \in \Omega_i$.

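Recall that $\omega \in \Omega_i$ if there exists a codeword $c = (c_1, \ldots, c_N) \in C$ such that $c_i = 1$ and $c_{\sim i} \prec \omega$. By construction of $\pi$, we have that $(c_{\pi(1)}, \ldots, c_{\pi(N)}) \in C$ and, in addition, $c_{\pi(i)} = c_i = 1$. By construction of $\tilde{\pi}$,

$$(c_{\pi(1)}, \ldots, c_{\pi(i-1)}, c_{\pi(i+1)}, \ldots, c_{\pi(N)}) \prec (\omega_{\tilde{\pi}(1)}, \ldots, \omega_{\tilde{\pi}(N-1)}).$$

As a result, $(\omega_{\tilde{\pi}(1)}, \ldots, \omega_{\tilde{\pi}(N-1)}) \in \Omega_i$ and the proof is complete. ■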

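The symmetry in Lemma 6 can be observed on RM(3, 1). The Python sketch below takes affine permutations $\pi(x) = A(x \oplus i) \oplus i$ of $\mathbb{F}_2^3$, which fix $i$. Affine maps preserve RM codes when coordinates are indexed as in the Kronecker construction. The sketch uses these permutations to move a fixed position $j_1$ to every other position $j_2$, and checks that the induced action on erasure patterns maps $\Omega_i$ onto itself. Erasure patterns are stored as integers over all $N$ positions with bit $i$ always clear, which sidesteps the re-indexing maps $S_1, S_2$. This representation and all function names are illustrative assumptions.

```python
import random


def rm_rows(n, r):
    """Generator rows of RM(n, r) as ints (bit b = coordinate b)."""
    N = 1 << n
    rows = [sum(1 << b for b in range(N) if (b & ~a) == 0) for a in range(N)]
    return [g for g in rows if bin(g).count("1") >= 1 << (n - r)]


def all_codewords(rows):
    """All GF(2) combinations of the generator rows."""
    words = {0}
    for g in rows:
        words |= {w ^ g for w in words}
    return sorted(words)


def rank_gf2(vectors):
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)


def linear_map(cols, x):
    y = 0
    for k, col in enumerate(cols):
        if (x >> k) & 1:
            y ^= col
    return y


def affine_fixing(n, i, j1, j2, rng):
    """Random pi(x) = A(x ^ i) ^ i with A invertible and pi(j1) = j2 (j1, j2 != i)."""
    while True:
        cols = [rng.randrange(1 << n) for _ in range(n)]
        if rank_gf2(cols) == n and linear_map(cols, j1 ^ i) == j2 ^ i:
            return [linear_map(cols, x ^ i) ^ i for x in range(1 << n)]


def omega(words, i, N):
    """Omega_i as a set of erasure patterns (ints over [N] with bit i clear)."""
    supports = [c & ~(1 << i) for c in words if (c >> i) & 1]
    return {E for E in range(1 << N)
            if not ((E >> i) & 1) and any((s & ~E) == 0 for s in supports)}


n, r, i, j1 = 3, 1, 0, 1
N = 1 << n
omega_i = omega(all_codewords(rm_rows(n, r)), i, N)
rng = random.Random(0)
for j2 in range(N):
    if j2 == i:
        continue
    pi = affine_fixing(n, i, j1, j2, rng)
    assert pi[i] == i and pi[j1] == j2
    # pattern E is sent to E' with E'_j = E_{pi(j)}, as in the proof of Lemma 6
    image = {sum(1 << j for j in range(N) if (E >> pi[j]) & 1) for E in omega_i}
    assert image == omega_i
print(len(omega_i), "patterns; Omega_i is invariant for every target j2")  # 64 patterns; ...
```
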
We now show that all EXIT functions of a 2-transitive code 
are identical. 

Lemma 7 ($h_i$ Is Independent of $i$): If $C[N, K]$ is a 2-transitive binary linear code, then $h_i(\epsilon) = h_j(\epsilon)$ for all $i, j \in [N]$. That is, $h_i(\epsilon)$ is independent of $i$.

Proof: Since $C$ is 2-transitive, there exists a permutation $\pi : [N] \to [N]$ such that

• $\pi(i) = j$,

• $(c_{\pi(1)}, \ldots, c_{\pi(N)}) \in C$ for any $(c_1, \ldots, c_N) \in C$.

Let $S_i : [N-1] \to [N] \setminus \{i\}$ be defined as $S_i(k) = k$ for $k \in \{1, \ldots, i-1\}$ and $S_i(k) = k + 1$ for $k \in \{i, \ldots, N-1\}$. Let $S_j : [N] \setminus \{j\} \to [N-1]$ be defined as $S_j(k) = k$ for $k \in \{1, \ldots, j-1\}$ and $S_j(k) = k - 1$ for $k \in \{j+1, \ldots, N\}$. Consider the permutation $\tilde{\pi} : [N-1] \to [N-1]$ defined as $\tilde{\pi}(k) = S_j(\pi(S_i(k)))$.


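is upper bounded by $\delta$ for all $i \in [N]$ and all erasure probabilities $\epsilon' \le \epsilon$. In order to conclude the proof, it suffices to show that $\epsilon$ is close to $1 - R$. Note that, by definition of $\bar{\epsilon}$, the area under $h_i(\epsilon)$ is at least equal to

$$(1 - \delta)(1 - \bar{\epsilon}),$$

since $h_i$ is increasing and $h_i(\bar{\epsilon}) = 1 - \delta$, so that $h_i(x) \ge 1 - \delta$ for all $x \in [\bar{\epsilon}, 1]$.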

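Pick $\omega \in \Omega_j$. Then, there exists a codeword $c$ such that $c_j = 1$ and $c_{\sim j} \prec \omega$. By construction of $\pi$, we have that $(c_{\pi(1)}, \ldots, c_{\pi(N)}) \in C$ and, in addition, $c_{\pi(i)} = c_j = 1$. By construction of $\tilde{\pi}$,

$$(c_{\pi(1)}, \ldots, c_{\pi(i-1)}, c_{\pi(i+1)}, \ldots, c_{\pi(N)}) \prec (\omega_{\tilde{\pi}(1)}, \ldots, \omega_{\tilde{\pi}(N-1)}).$$

As a result, $(\omega_{\tilde{\pi}(1)}, \ldots, \omega_{\tilde{\pi}(N-1)}) \in \Omega_i$.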

With an abuse of notation, let us define

$$\tilde{\pi}(\Omega_j) = \{(\omega_{\tilde{\pi}(1)}, \ldots, \omega_{\tilde{\pi}(N-1)}) : \omega \in \Omega_j\}.$$

Then, the previous argument implies that $\tilde{\pi}(\Omega_j) \subseteq \Omega_i$.

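It is clear that, if $\omega \neq \omega'$, then $(\omega_{\tilde{\pi}(1)}, \ldots, \omega_{\tilde{\pi}(N-1)}) \neq (\omega'_{\tilde{\pi}(1)}, \ldots, \omega'_{\tilde{\pi}(N-1)})$. Indeed, if $\omega \neq \omega'$, then there exists an index $k$ such that $\omega_k \neq \omega'_k$ and, therefore, the two permuted vectors differ in position $\tilde{\pi}^{-1}(k)$. In addition, the permutation $\tilde{\pi}$ leaves the weight of $\omega$ unchanged. As a result, we have

$$h_j(\epsilon) = \mu_\epsilon(\Omega_j) \overset{(a)}{=} \mu_\epsilon(\tilde{\pi}(\Omega_j)) \overset{(b)}{\le} \mu_\epsilon(\Omega_i) = h_i(\epsilon), \qquad (4)$$

where (a) holds because the map $\omega \mapsto (\omega_{\tilde{\pi}(1)}, \ldots, \omega_{\tilde{\pi}(N-1)})$ is injective and weight-preserving, while the channel acts independently and identically on each component, so that $\mu_\epsilon$ depends only on the weight; and (b) follows from $\tilde{\pi}(\Omega_j) \subseteq \Omega_i$. By repeating the same argument with the indices $i$ and $j$ exchanged, we obtain the opposite inequality and, therefore, the claim follows. ■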

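A quick way to see Lemma 7 at work is to compare the weight profiles $A^{(i)}_w = |\{\omega \in \Omega_i : \mathrm{wt}(\omega) = w\}|$, which by Lemma 4 determine $h_i(\epsilon) = \sum_w A^{(i)}_w \epsilon^w (1-\epsilon)^{N-1-w}$. The Python sketch below computes them for RM(3, 1) and for a small hand-picked code of length 4 (a hypothetical example of ours, not from the text). The profiles coincide for all positions of RM(3, 1). For the toy code they differ, so no code automorphism can map position 0 to position 2, and that code is not transitive. Names and the 0-indexed bit-mask representation are illustrative assumptions.

```python
def rm_rows(n, r):
    """Generator rows of RM(n, r) as ints (bit b = coordinate b)."""
    N = 1 << n
    rows = [sum(1 << b for b in range(N) if (b & ~a) == 0) for a in range(N)]
    return [g for g in rows if bin(g).count("1") >= 1 << (n - r)]


def all_codewords(rows):
    """All GF(2) combinations of the generator rows."""
    words = {0}
    for g in rows:
        words |= {w ^ g for w in words}
    return sorted(words)


def profile(words, i, N):
    """A[w] = number of weight-w patterns in Omega_i (Definition 2)."""
    supports = [c & ~(1 << i) for c in words if (c >> i) & 1]
    A = [0] * N
    for E in range(1 << N):
        if not ((E >> i) & 1) and any((s & ~E) == 0 for s in supports):
            A[bin(E).count("1")] += 1
    return A


rm31 = all_codewords(rm_rows(3, 1))
print({tuple(profile(rm31, i, 8)) for i in range(8)})
# {(0, 0, 0, 7, 28, 21, 7, 1)}: one profile shared by all eight positions

# Hypothetical toy [4, 2] code generated by 1110 and 0011 (bit j = coordinate j)
toy = all_codewords([0b0111, 0b1100])
print([profile(toy, i, 4) for i in range(4)])
# [[0, 0, 2, 1], [0, 0, 2, 1], [0, 1, 3, 1], [0, 1, 3, 1]]
```
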
We recall here the main ingredient for our proof, due to 
Friedgut and Kalai. We note that Tillich and Zémor applied
the following theorem in [?] to show that every sequence
of linear codes of increasing Hamming distance has a sharp 
threshold under block-MAP decoding for transmission over the 
BEC and the BSC. 

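Theorem 2 (Sharp Threshold [17]): Let $\Omega \subseteq \{0,1\}^N$ be a symmetric monotone set, where symmetry and monotonicity are defined as in Lemmas 6 and 5, respectively, and let $\mu_\epsilon$ denote the product measure on $\{0,1\}^N$ that puts weight $\epsilon^w (1-\epsilon)^{N-w}$ on a point of Hamming weight $w$. If $\mu_\epsilon(\Omega) > \delta$ for some $0 < \delta < 1/2$, then $\mu_{\epsilon'}(\Omega) > 1 - \delta$ for

$$\epsilon' = \epsilon + c\,\frac{\log\frac{1}{2\delta}}{\log N},$$

where $c$ is an absolute constant.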

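Theorem 2 is what turns the symmetry of $\Omega_i$ into a sharp threshold. As an informal illustration only (a simulation, not the Friedgut–Kalai argument), the Python sketch below estimates $h_0(\epsilon)$ by Monte Carlo for RM(5, 2) and RM(7, 3), both of rate exactly $1/2$. It uses a GF(2) rank test to decide whether bit 0 is determined by the unerased outputs. One would expect the estimated curve to be steeper around $\epsilon = 1 - R = 1/2$ for the longer code, but the exact numbers depend on the seed and on the number of trials. Position 0 is used without loss of generality by Lemma 7. The function names, the trial count and the grid of erasure probabilities are arbitrary choices.

```python
import random


def rm_rows(n, r):
    """Generator rows of RM(n, r) as ints (bit b = coordinate b)."""
    N = 1 << n
    rows = [sum(1 << b for b in range(N) if (b & ~a) == 0) for a in range(N)]
    return [g for g in rows if bin(g).count("1") >= 1 << (n - r)]


def columns(rows, N):
    """Column j of the generator matrix, packed into a K-bit int."""
    return [sum(((g >> j) & 1) << k for k, g in enumerate(rows)) for j in range(N)]


def reduce_vec(v, basis):
    """Reduce v against a GF(2) basis whose elements have distinct leading bits."""
    for b in basis:
        v = min(v, v ^ b)
    return v


def exit_estimate(cols, i, eps, trials, rng):
    """Fraction of random erasure patterns (each j != i erased w.p. eps) for
    which x_i is not determined by the unerased outputs."""
    failures = 0
    for _ in range(trials):
        basis = []
        for j, col in enumerate(cols):
            if j != i and rng.random() >= eps:  # y_j is received
                v = reduce_vec(col, basis)
                if v:
                    basis.append(v)
        # x_i is determined iff column i lies in the span of the received columns
        if reduce_vec(cols[i], basis):
            failures += 1
    return failures / trials


rng = random.Random(1)
grid = [0.25, 0.4, 0.5, 0.6, 0.75]
curves = {}
for n, r in [(5, 2), (7, 3)]:  # K = N / 2 for both codes
    N = 1 << n
    cols = columns(rm_rows(n, r), N)
    curves[(n, r)] = [exit_estimate(cols, 0, eps, 200, rng) for eps in grid]
    print(f"RM({n}, {r}), N = {N}:", " ".join(f"{h:.2f}" for h in curves[(n, r)]))
```
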
Proof of Theorem 1: Consider a sequence of codes RM$(n, r_n)$ with rates converging to $R$. That is, the $n$th code in the sequence has a rate $R_n \le R + \delta_n$, where $\delta_n \to 0$ as $n \to \infty$.

Lemma 7 implies that $h_i(\epsilon)$ is independent of $i$, and, thus, it is equal to the average EXIT function $h(\epsilon)$. Therefore, by Lemma 3 we have

$$\int_0^1 h_i(\epsilon)\, d\epsilon = R_n \le R + \delta_n.$$


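Hence, the area under $h_i(\epsilon)$ is at most equal to $R + \delta_n$. Combining this with the lower bound $(1 - \delta)(1 - \bar{\epsilon})$ on the same area, we get $(1 - \delta)(1 - \bar{\epsilon}) \le R + \delta_n$, i.e., $\bar{\epsilon} \ge 1 - R - \delta - \delta_n + \delta\bar{\epsilon} \ge 1 - R - \delta - \delta_n$. Since $\epsilon = \bar{\epsilon} - c\log\frac{1}{2\delta}/\log(N - 1)$, we obtain

$$\epsilon \ge 1 - R - \delta - \delta_n - c\,\frac{\log\frac{1}{2\delta}}{\log(N - 1)}. \qquad (5)$$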


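We see that $\epsilon$ can be made arbitrarily close to $1 - R$ by picking $\delta$ sufficiently small and $N$ sufficiently large (recall that $\delta_n \to 0$). That is, the bit error probability can be made arbitrarily small for erasure probabilities arbitrarily close to $1 - R$, i.e., at rates arbitrarily close to capacity. ■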

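To get a feeling for how slowly bound (5) approaches $1 - R$, the Python sketch below evaluates its right-hand side for the rate-1/2 codes RM$(n, (n-1)/2)$ with $n$ odd, for which $R_n = 1/2$ exactly and hence $\delta_n = 0$. Theorem 2 only asserts that some absolute constant $c$ exists. The value $c = 1$ used here is a placeholder, not the true constant, so the printed numbers only indicate the $1/\log(N-1)$ rate of convergence. With this placeholder, the bound is even negative, i.e., vacuous, for the smallest $n$. The names `rm_rate` and `bound_5` are illustrative.

```python
from math import comb, log


def rm_rate(n, r):
    """Rate of RM(n, r): sum_{i <= r} binom(n, i) / 2^n."""
    return sum(comb(n, i) for i in range(r + 1)) / 2 ** n


def bound_5(R, R_n, N, delta, c):
    """Right-hand side of (5), with delta_n = max(R_n - R, 0).
    c stands for the unspecified absolute constant of Theorem 2."""
    delta_n = max(R_n - R, 0.0)
    return 1 - R - delta - delta_n - c * log(1 / (2 * delta)) / log(N - 1)


R, delta, c = 0.5, 0.01, 1.0  # c = 1.0 is a placeholder value
for n in (9, 17, 33, 65):
    r = (n - 1) // 2
    print(f"n = {n:2d}: RHS of (5) = {bound_5(R, rm_rate(n, r), 2 ** n, delta, c):+.3f}")
```

Because the correction term only decays like $1/\log N = 1/(n \log 2)$, doubling $n$ roughly halves the gap to $1 - R - \delta$.
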

III. Generalizations and Discussion 

As mentioned above, the foregoing arguments hold for all 2-transitive codes, and not just for RM codes. That is, every sequence of such codes is capacity-achieving over the BEC under bit-MAP decoding. This includes, for example, the class of extended BCH codes ([3, Chapter 8.5, Theorem 16]).

RM codes are only one possible family of codes that can 
be derived from the Hadamard matrix. It is reasonable to 
assume that any subset of generators of sufficient weight from 
the Hadamard matrix will produce good codes. It would be 
interesting to see if such a statement can be proved. Clearly, 
the symmetries of RM codes that are used here will not be 
present in general. 

Perhaps of even greater interest is whether RM codes achieve capacity on general binary-input memoryless output-symmetric channels, and whether the above technique can be extended. Note that it suffices to prove that RM codes achieve capacity for the BSC since (up to a small factor) the BSC is the worst channel, see [19, pp. 87–89]. Most of the notions that we used here for the BEC have a straightforward generalization (e.g., GEXIT functions replace EXIT functions) or need no generalization (2-transitivity). However, it is currently unclear if the GEXIT function can be encoded in terms of a monotone function. It is likely that different techniques will be needed to show sharp thresholds for GEXIT functions.

One of the main motivations for studying RM codes is their 
superior empirical performance (over the BEC) compared with 
the capacity-achieving polar codes. By far the most important 
practical question is whether this promised performance can
be harnessed at low complexity.


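Consider the set $\Omega_i$ defined in Definition 2 that encodes $h_i(\epsilon)$. By Lemmas 5 and 6, $\Omega_i$ is monotone and symmetric. Therefore, from Theorem 2 we have that if $h_i(\bar{\epsilon}) = 1 - \delta$, then

$$h_i(\epsilon) \le \delta \quad \text{for } \epsilon = \bar{\epsilon} - c\,\frac{\log\frac{1}{2\delta}}{\log(N - 1)},$$

where $c$ is an absolute constant. Indeed, if we had $h_i(\epsilon) > \delta$, then Theorem 2, applied to $\Omega_i \subseteq \{0,1\}^{N-1}$, would give $h_i(\bar{\epsilon}) > 1 - \delta$.

Now, the function $h_i(\epsilon)$ is increasing, and therefore by Lemma 2, the error probability of the $i$th bit-MAP decoder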